NSF PAR Search | NSF Public Access Repository

MM-Gen: Principled and Generalizable Data Curation for Enhancing Task Performance in VLMs

Joshi, Siddharth; Nushi, Besmira; Balachandran, Vidhisha; Chandrasekaran, Varun; Vineet, Vibhav; Joshi, Neel; Mirzasoleiman, Baharan (September 2025, Journal of Data-centric Machine Learning Research (DMLR))

Free, publicly-accessible full text available September 25, 2026
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning

Li, Shuyue_Stella; Balachandran, Vidhisha; Feng, Shangbin; Ilgen, Jonathan_S; Pierson, Emma; Koh, Pang_Wei; Tsvetkov, Yulia (December 2024, NeurIPS)

Free, publicly-accessible full text available December 1, 2025
Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically

Ahuja, Kabir; Balachandran, Vidhisha; Panwar, Madhur; He, Tianxing; Smith, Noah_A; Goyal, Navin; Tsvetkov, Yulia (December 2024, Transactions of the Association for Computational Linguistics)

Free, publicly-accessible full text available December 1, 2025
MEDIQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning

Li, Shuyue_Stella; Balachandran, Vidhisha; Feng, Shangbin; Ilgen, Jonathan_S; Pierson, Emma; Koh, Pang_Wei; Tsvetkov, Yulia (December 2024, NeurIPS)

Free, publicly-accessible full text available December 1, 2025
Teaching LLMs to Abstain across Languages via Multilingual Feedback

Feng, Shangbin; Shi, Weijia; Wang, Yike; Ding, Wenxuan; Ahia, Orevaoghene; Li, Shuyue_Stella; Balachandran, Vidhisha; Sitaram, Sunayana; Tsvetkov, Yulia (December 2024, EMNLP)

Free, publicly-accessible full text available December 1, 2025
Resolving Knowledge Conflicts in Large Language Models

Wang, Yike; Feng, Shangbin; Wang, Heng; Shi, Weijia; Balachandran, Vidhisha; He, Tianxing; Tsvetkov, Yulia (October 2024, COLM)

Full Text Available
Fine-grained Hallucination Detection and Editing for Language Models

Mishra, Abhika; Asai, Akari; Balachandran, Vidhisha; Wang, Yizhong; Neubig, Graham; Tsvetkov, Yulia; Hajishirzi, Hannaneh (October 2024, COLM)

Full Text Available
P3Sum: Preserving Author’s Perspective in News Summarization with Diffusion Language Models

Liu, Yuhan; Feng, Shangbin; Han, Xiaochuang; Balachandran, Vidhisha; Park, Chan_Young; Kumar, Sachin; Tsvetkov, Yulia (June 2024, NAACL)

In this work, we take a first step towards designing summarization systems that are faithful to the author’s intent, not only the semantic content of the article. Focusing on a case study of preserving political perspectives in news summarization, we find that existing approaches alter the political opinions and stances of news articles in more than 50% of summaries, misrepresenting the intent and perspectives of the news authors. We thus propose P3Sum, a diffusion model-based summarization approach controlled by political perspective classifiers. In P3Sum, the political leaning of a generated summary is iteratively evaluated at each decoding step, and any drift from the article’s original stance incurs a loss back-propagated to the embedding layers, steering the political stance of the summary at inference time. Extensive experiments on three news summarization datasets demonstrate that P3Sum outperforms state-of-the-art summarization systems and large language models by up to 13.7% in terms of the success rate of stance preservation, with competitive performance on standard metrics of summarization quality. Our findings present a first analysis of preservation of pragmatic features in summarization, highlight the lacunae in existing summarization models—that even state-of-the-art models often struggle to preserve author’s intents—and develop new summarization systems that are more faithful to author’s perspectives.
Full Text Available
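
The steering loop described in the P3Sum abstract (score the stance at each decoding step, penalize drift from the article's stance, and push the gradient back into the embeddings) can be illustrated with a toy sketch. This is not the authors' implementation: a hand-written logistic classifier stands in for the political perspective classifier, a three-number list stands in for the embedding, and every name and value here (stance_probability, steer, weights, lr, steps) is an assumption made for illustration only.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def stance_probability(embedding, weights, bias):
    """Toy stance classifier: probability that the text has stance 1."""
    z = sum(w * x for w, x in zip(weights, embedding)) + bias
    return sigmoid(z)


def steer(embedding, weights, bias, target, lr=0.5, steps=20):
    """Nudge the embedding toward the target stance, one update per step.

    For a logistic classifier with binary cross-entropy loss, the gradient
    of the loss with respect to the embedding is (p - target) * weights.
    """
    x = list(embedding)
    for _ in range(steps):
        p = stance_probability(x, weights, bias)
        x = [xi - lr * (p - target) * wi for xi, wi in zip(x, weights)]
    return x


# Hypothetical classifier parameters and a draft embedding that has drifted
# away from the article's stance (target = 1.0).
weights = [1.0, -2.0, 0.5]
bias = 0.0
draft = [-1.0, 1.0, 0.0]

before = stance_probability(draft, weights, bias)
steered = steer(draft, weights, bias, target=1.0)
after = stance_probability(steered, weights, bias)
print(f"stance probability before: {before:.3f}, after: {after:.3f}")
```

In the real system the classifier is a trained model and the update happens inside a diffusion language model's decoding process; the sketch only shows the direction of the gradient signal.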
Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models

Feng, Shangbin; Shi, Weijia; Bai, Yuyang; Balachandran, Vidhisha; He, Tianxing; Tsvetkov, Yulia (May 2024, International Conference on Learning Representations)

By design, large language models (LLMs) are static general-purpose models, expensive to retrain or update frequently. As they are increasingly adopted for knowledge-intensive tasks, it becomes evident that these design choices lead to failures to generate factual, relevant, and up-to-date knowledge. To this end, we propose Knowledge Card, a modular framework to plug in new factual and relevant knowledge into general-purpose LLMs. We first introduce knowledge cards: specialized language models trained on corpora from specific domains and sources. Knowledge cards serve as parametric repositories that are selected at inference time to generate background knowledge for the base LLM. We then propose three content selectors to dynamically select and retain information in documents generated by knowledge cards, specifically controlling for relevance, brevity, and factuality of outputs. Finally, we propose two complementary integration approaches to augment the base LLM with the (relevant, factual) knowledge curated from the specialized LMs. Through extensive experiments, we demonstrate that Knowledge Card achieves state-of-the-art performance on six benchmark datasets. Ultimately, the Knowledge Card framework enables dynamic synthesis and updates of knowledge from diverse domains. Its modularity will ensure that relevant knowledge can be continuously updated through the collective efforts of the research community.
Full Text Available
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models

https://doi.org/10.1145/3589334.3645623

Bai, Yuyang; Feng, Shangbin; Balachandran, Vidhisha; Tan, Zhaoxuan; Lou, Shiqi; He, Tianxing; Tsvetkov, Yulia (May 2024, ACM)

Large language models (LLMs) demonstrate remarkable performance on knowledge-intensive tasks, suggesting that real-world knowledge is encoded in their model parameters. However, besides explorations on a few probing tasks in limited knowledge domains, it is not well understood how to evaluate LLMs' knowledge systematically and how well their knowledge abilities generalize, across a spectrum of knowledge domains and progressively complex task formats. To this end, we propose KGQuiz, a knowledge-intensive benchmark to comprehensively investigate the knowledge generalization abilities of LLMs. KGQuiz is a scalable framework constructed from triplet-based knowledge, which covers three knowledge domains and consists of five tasks with increasing complexity: true-or-false, multiple-choice QA, blank filling, factual editing, and open-ended knowledge generation. To gain a better understanding of LLMs' knowledge abilities and their generalization, we evaluate 10 open-source and black-box LLMs on the KGQuiz benchmark across the five knowledge-intensive tasks and knowledge domains. Extensive experiments demonstrate that LLMs achieve impressive performance in straightforward knowledge QA tasks, while settings and contexts requiring more complex reasoning or employing domain-specific facts still present significant challenges. We envision KGQuiz as a testbed to analyze such nuanced variations in performance across domains and task formats, and ultimately to understand, evaluate, and improve LLMs' knowledge abilities across a wide spectrum of knowledge domains and tasks.
Full Text Available
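
As a rough illustration of how question formats like those named in the KGQuiz abstract could be derived from triplet-based knowledge, the sketch below builds true-or-false and multiple-choice items from (head, relation, tail) triplets. It is a minimal sketch under stated assumptions, not the benchmark's actual construction code: the Triplet type, the two helper functions, the negative-sampling strategy, and the toy facts are all hypothetical.

```python
import random
from typing import NamedTuple


class Triplet(NamedTuple):
    head: str
    relation: str
    tail: str


def true_or_false(triplet, all_tails, rng):
    """Return (statement, label); half the time the tail is swapped for a wrong one."""
    if rng.random() < 0.5:
        return f"{triplet.head} {triplet.relation} {triplet.tail}.", True
    wrong = rng.choice([t for t in all_tails if t != triplet.tail])
    return f"{triplet.head} {triplet.relation} {wrong}.", False


def multiple_choice(triplet, all_tails, rng, n_options=4):
    """Return (question, options, index_of_correct_option)."""
    candidates = [t for t in all_tails if t != triplet.tail]
    options = rng.sample(candidates, n_options - 1) + [triplet.tail]
    rng.shuffle(options)
    question = f"{triplet.head} {triplet.relation} ___?"
    return question, options, options.index(triplet.tail)


# Toy knowledge graph (illustrative only).
facts = [
    Triplet("Paris", "is the capital of", "France"),
    Triplet("Tokyo", "is the capital of", "Japan"),
    Triplet("Ottawa", "is the capital of", "Canada"),
    Triplet("Cairo", "is the capital of", "Egypt"),
    Triplet("Lima", "is the capital of", "Peru"),
]
tails = sorted({f.tail for f in facts})
rng = random.Random(0)  # fixed seed so runs are repeatable

for fact in facts:
    print(true_or_false(fact, tails, rng))
    print(multiple_choice(fact, tails, rng))
```

The remaining formats in the abstract (blank filling, factual editing, open-ended generation) would need answer matching against free text rather than a fixed label, which this sketch does not attempt.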
